AFRL-AFOSR-VA-TR-2016-0033


INFERRING  NETWORK  CONTROLS  FROM  TOPOLOGY  USING  THE  CHOMP  DATABASE 


John  Harer 
DUKE  UNIVERSITY 


12/03/2015 
Final  Report 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Air  Force  Research  Laboratory 
AF  Office  Of  Scientific  Research  (AFOSR)/  RTA2 
Arlington,  Virginia  22203 
Air  Force  Materiel  Command 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  No.  0704-0188 


The  public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information,  including 
suggestions  for  reducing  the  burden,  to  the  Department  of  Defense,  Executive  Service  Directorate  (0704-0188).  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no 
person  shall  be  subject  to  any  penalty  for  failing  to  comply  with  a  collection  of  information  if  it  does  not  display  a  currently  valid  OMB  control  number. 

PLEASE  DO  NOT  RETURN  YOUR  FORM  TO  THE  ABOVE  ORGANIZATION. 


1.  REPORT  DATE  (DD-MM-YYYY)  2.  REPORT  TYPE 

11-30-2015  Final  Report 


4.  TITLE  AND  SUBTITLE 

INFERRING  NETWORK  CONTROLS  FROM  TOPOLOGY  USING  THE  CHOMP 
DATABASE 


3.  DATES  COVERED  (From  -  To) 

9/1/15-8/31/15 


5a.  CONTRACT  NUMBER 


5b.  GRANT  NUMBER 

FA9550-10-1-0436


5c.  PROGRAM  ELEMENT  NUMBER 


6.  AUTHOR(S) 

John  Harer 

Konstantin  Mischaikow 
Sayan  Mukherjee 


5d.  PROJECT  NUMBER 


5e.  TASK  NUMBER 


5f.  WORK  UNIT  NUMBER 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Duke  University 
Rutgers  University 


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

Air  Force  Office  of  Scientific  Research 
Arlington,  Va 


12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

DISTRIBUTION  A 


10.  SPONSOR/MONITOR'S  ACRONYM(S) 


11.  SPONSOR/MONITOR'S  REPORT
NUMBER(S) 


14.  ABSTRACT 

In this project we developed extensive new capabilities in the area of Topological Data Analysis (TDA) and its application to dynamical systems.
The role of this work in the Complex Networks program is based on the fact that dynamics on networks is a difficult subject, and our combined
expertise in dynamics, topology and statistics has been brought to bear on this problem. Among other things, we developed a set of new methods
for fast computation of invariants from TDA, found new applications of TDA to time series datasets, made significant contributions to the
integration of statistics and topology, found when certain statistics are sufficient for shape characterization, and constructed a set of new invariants
for multi-scale analysis of dynamical systems.


16.  SECURITY  CLASSIFICATION  OF: 

a.  REPORT 

b.  ABSTRACT 

c.  THIS  PAGE 

U

U

U

17.  LIMITATION  OF 
ABSTRACT 


18.  NUMBER  OF  PAGES

19a.  NAME  OF  RESPONSIBLE  PERSON

John  L  Harer

19b.  TELEPHONE  NUMBER  (Include  area  code)

1  919-660-2845


Standard  Form  298  (Rev.  8/98) 

Prescribed  by  ANSI  Std.  Z39.18 
Adobe  Professional  7.0 




Final  Report 


John  Harer 
Duke  University 

Konstantin  Mischaikow 
Rutgers  University 

Sayan  Mukherjee 
Duke  University 

November  30,  2015 


1  John  Harer  Research 

In  order  to  analyze  networks  dynamically,  we  developed  a  variety  of  tools  for  applying  and 
computing  Topological  Data  Analysis  invariants. 

1.1  SWIPerS 

A significant development during the project was a theoretical framework, with
applications, for the topological study of time series data [21]. Broadly speaking,
we have studied the geometrical and topological properties of sliding window embeddings,
as seen through the lens of persistent homology. In particular, we have shown that
maximum persistence at the point-cloud level can be used to quantify periodicity at the
signal level, proven structural and convergence theorems for the resulting persistence
diagrams, and derived estimates for their dependency on window size and embedding
dimension. We have also applied this methodology to quantifying periodicity in synthetic
data sets, and compared the results with those obtained using state-of-the-art methods in
gene expression analysis. We call this new method SWIPerS, which stands for Sliding
Windows and 1-dimensional Persistence Scoring. More recently we have used SWIPerS to
study physiological data and to look for common patterns in signals. This latter task is
a major effort in all of our work for the Air Force, as we see it as a revolutionary way
to study hidden patterns and discover interesting structure in multi-INT data.
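
As a minimal sketch of the sliding window embedding that this kind of analysis starts from, the Python below turns a sampled signal into a point cloud. The function name `sliding_window` and the parameters `dim` and `tau` are illustrative assumptions, not names from the project's code, and the persistence computation on the resulting cloud is not shown.

```python
import math

def sliding_window(signal, dim, tau):
    """Return the sliding window point cloud of a 1-D signal.

    Each point is (signal[i], signal[i + tau], ..., signal[i + dim * tau]),
    so the points live in R^(dim + 1).
    """
    span = dim * tau
    return [
        tuple(signal[i + k * tau] for k in range(dim + 1))
        for i in range(len(signal) - span)
    ]

# A periodic test signal: 100 samples, 20 samples per period.
signal = [math.cos(2 * math.pi * t / 20) for t in range(100)]

# With dim = 1 and tau = 5 (a quarter period) each point is
# (cos x, cos(x + pi/2)) = (cos x, -sin x), so the cloud lies on the unit circle.
cloud = sliding_window(signal, dim=1, tau=5)
radii = [math.hypot(x, y) for x, y in cloud]
```

A periodic signal produces a closed loop in the embedding, which is the kind of feature that 1-dimensional persistence can measure.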

1.2 Fréchet Means of Persistence Diagrams

In order to use persistence diagrams as a true statistical tool, it is very useful to have a
good notion of mean and variance for a set of diagrams. On this project Harer and
collaborators Mileyko, Turner, Mukherjee, Bendich, Mattingly and Munch showed that such a
mean and variance exist and developed algorithms to compute them [17], [24]. This work was
also extended by altering the original definition of the Fréchet mean so that it becomes a
probability measure on the set of persistence diagrams; in a nutshell, the mean of a set of
diagrams is a weighted sum of atomic measures, where each atom is itself the Fréchet mean
persistence diagram of a perturbation of the input diagrams [19]. We showed that this new
definition defines a Hölder continuous map from the product of k copies of D, the
persistence diagram space, to the space of probability measures on D, and that it is an
extremely useful statistic on vineyards.
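
For orientation, the standard Fréchet mean and variance of a finite sample in a metric space can be written as below. The symbols $F$, $X_i$ and $d$ are notation assumed here for illustration (with $d$ a Wasserstein-type metric on $D$); the report itself does not fix this notation.

```latex
% Frechet function of diagrams X_1, ..., X_k in a metric space (D, d)
\[
  F(Y) \;=\; \frac{1}{k} \sum_{i=1}^{k} d(X_i, Y)^2 , \qquad Y \in D,
\]
% Frechet variance and the (possibly non-unique) set of Frechet means
\[
  \operatorname{Var} \;=\; \inf_{Y \in D} F(Y), \qquad
  \operatorname{Mean} \;=\; \{\, Y \in D : F(Y) = \operatorname{Var} \,\}.
\]
```

The minimizer need not be unique, which is one reason a measure-valued notion of mean such as the one described above can be convenient.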

1.3  Cover  Trees  for  Fast  Computation  of  Persistence 

The theory of cover trees can be applied to a dataset to construct approximations for
algorithms such as k nearest neighbors. Sheehy showed that these can also be used to
give approximations to the Rips complex for a dataset, with the result that the number
of simplices that have to be considered is linear in the number of points [22]. In
addition, Kerber and others now have fast algorithms for reducing a persistence matrix:
https://code.google.com/p/phat/. Combined with a fast algorithm for finding cliques in a
graph, this almost gives a very fast method for computing persistence. What is missing
is the construction of the matrix to reduce. We developed a fast algorithm for doing this,
based on a sophisticated hashing algorithm.
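
To make "reducing a persistence matrix" concrete, here is a minimal sketch of the textbook column reduction over Z/2. It is not the hashing-based matrix construction described above and not PHAT's optimized implementation; the function name and the sparse column representation are assumptions made for illustration.

```python
def reduce_boundary_matrix(columns):
    """Standard persistence reduction over Z/2.

    columns[j] is the set of row indices of the nonzero entries of column j,
    with simplices indexed in filtration order.
    Returns the (birth index, death index) pairs.
    """
    columns = [set(c) for c in columns]
    pivot_owner = {}   # lowest nonzero row -> column that owns that pivot
    pairs = []
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the lowest entry is unclaimed.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:
            low = max(col)
            pivot_owner[low] = j
            pairs.append((low, j))
    return pairs

# Filled triangle: vertices 0, 1, 2; edges 3 = {0,1}, 4 = {1,2}, 5 = {0,2};
# the 2-simplex 6 has boundary {3, 4, 5}.
triangle = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
pairs = reduce_boundary_matrix(triangle)
```

Indices that appear in no pair, such as vertex 0 here, correspond to classes that never die in the filtration.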

1.4  Tracking 

We  also  applied  persistence  to  the  analysis  of  tracking  information  based  on  activity  of 
agents  under  surveillance. 

2  Konstantin  Mischaikow  Research 

Given an explicit continuous dynamical system generated by a nonlinear map or differential
equation, the robust dynamics can be extracted with mathematical rigor via a finite
set of computations. There are four essential steps to this procedure: (i) discretization of
phase space via a grid, (ii) construction of a multivalued map or directed graph defined
on the grid that provides an outer approximation of the dynamics of interest, (iii) graph
algorithms to decompose the dynamics, and (iv) computation of algebraic topological
invariants such as the Conley index using the directed graph information. Each of these
steps has a well defined relation to the original dynamics. The choice of grid in step (i)
determines the resolution at which the variables of the dynamical system are measured. The
numerical bounds used to determine the outer approximation in step (ii) provide bounds
on the precision of the model used to generate the dynamical system or the size of the
noise associated with the dynamical system. There are different approaches to the
decompositions in step (iii), and they result in information about the gradient-like or
recurrent-like structure of the system. Step (iv) is used to provide a mathematical
guarantee that the dynamics identified by the numerical computations are in fact true
solutions and not numerical artifacts. Recent examples of how these ideas fit together in
the context of multiparameter systems can be found in [1, 2]. We have also demonstrated
that these methods can be used to efficiently compute mathematically rigorous global
Lyapunov functions for nonlinear systems [5].
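
Steps (i) to (iii) can be sketched on a one-dimensional toy example. The code below is an illustration under stated assumptions, not the project's software: the map x -> x^2 on [0, 1], the Lipschitz-based enclosure, and all function names are chosen here for the example, and step (iv), the Conley index computation, is omitted.

```python
import math

def outer_approximation(f, lipschitz, n):
    """Directed graph on n equal boxes of [0, 1] that encloses the map f.

    Box i is sent to every box meeting [f(c) - L*r, f(c) + L*r], where c is
    the box centre and r its radius; the Lipschitz bound L guarantees that
    this interval contains the image of box i.
    """
    r = 0.5 / n
    graph = {}
    for i in range(n):
        c = (i + 0.5) / n
        lo, hi = f(c) - lipschitz * r, f(c) + lipschitz * r
        first = max(0, math.floor(lo * n))
        last = min(n - 1, math.floor(hi * n))
        graph[i] = list(range(first, last + 1))
    return graph

def reachable(graph, start):
    """Boxes reachable from start along a path with at least one edge."""
    seen, stack = set(), list(graph[start])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph[v])
    return seen

def recurrent_components(graph):
    """Strongly connected components that contain at least one edge."""
    reach = {v: reachable(graph, v) for v in graph}
    components = set()
    for v in graph:
        if v in reach[v]:  # v lies on a cycle
            components.add(frozenset(u for u in reach[v] if v in reach[u]))
    return sorted(sorted(c) for c in components)

# Step (i): 16 boxes; step (ii): enclosure of x -> x^2 (Lipschitz constant 2
# on [0, 1]); step (iii): recurrent part of the resulting directed graph.
graph = outer_approximation(lambda x: x * x, lipschitz=2.0, n=16)
recurrent = recurrent_components(graph)
```

On this toy map the components containing box 0 and box 15 enclose the fixed points 0 and 1. A crude enclosure can also report recurrent components that contain no true recurrent dynamics; tighter bounds or a finer grid shrink this overestimate.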

We  are  working  on  extending  these  ideas  and  techniques  to  the  setting  of  time  series 
data. 

2.1  Homology  computations 

Using  ideas  from  discrete  Morse  theory  we  have  developed  a  preprocessing  technique  that 
in  practice  greatly  reduces  the  size  of  the  complexes  on  which  the  computation  of  homology 
has  to  be  performed.  We  use  this  technique  to  compute  the  induced  map  on  homology  as 
follows.  Given  an  outer  approximation  with  acyclic  values  of  a  continuous  function,  the 
projection  map  from  the  graph  of  the  multivalued  map  to  the  domain  of  the  function  induces 
an  isomorphism  on  homology.  Using  the  Morse  preprocessing  one  computes  generators  of 
homology on the domain and the range. The induced map on homology is then obtained by
composing  on  the  level  of  homology  the  inverse  of  the  projection  on  the  domain  with  the 
projection  on  the  range.  The  advantage  of  this  approach  is  that  the  size  of  the  complexes 
in  both  the  domain  and  the  range  is  greatly  reduced.  This  work  has  been  published  [8] 
and  open  source  code  is  provided  [6]. 

The  efficiency  of  the  Morse  preprocessing  is  based  on  reducing  the  original  complex  to 
a  much  smaller  complex.  Simple  experiments  with  the  code  indicate  that  for  complicated 
higher  dimensional  complexes  the  resulting  complex  depends  on  the  order  in  which  the  cells 
of  the  original  complex  are  processed.  What  is  clear  from  our  experimental  work  is  that 
on  average  we  obtain  optimal  results  by  alternating  the  reduction  process  using  collapse 
and  co-collapse  operations.  We  are  preparing  a  paper  that  explains  this  procedure  and 
discusses  the  experimental  results  [16]. 
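
The reduction idea can be illustrated with plain elementary collapses on a simplicial complex. This is a simplified sketch, not the Morse preprocessing of [8] or the code in [6]: it handles only collapses (no co-collapses), and the function names and the greedy processing order are assumptions of the example.

```python
from itertools import combinations

def closure(top_simplices):
    """All nonempty faces of the given simplices, as frozensets of vertices."""
    faces = set()
    for s in top_simplices:
        for k in range(1, len(s) + 1):
            faces.update(frozenset(c) for c in combinations(s, k))
    return faces

def collapse(cells):
    """Greedily apply elementary collapses until no free face remains.

    A face is free when exactly one other cell properly contains it; removing
    the face together with that cell does not change the homology.
    """
    cells = set(cells)
    changed = True
    while changed:
        changed = False
        # Fixed processing order; other orders can give other reduced complexes.
        for face in sorted(cells, key=lambda s: (len(s), sorted(s))):
            cofaces = [s for s in cells if face < s]
            if len(cofaces) == 1 and len(cofaces[0]) == len(face) + 1:
                cells -= {face, cofaces[0]}
                changed = True
                break
    return cells

filled = closure([(0, 1, 2)])                 # a filled triangle: 7 cells
hollow = closure([(0, 1), (1, 2), (0, 2)])    # its boundary: 6 cells
reduced_filled = collapse(filled)
reduced_hollow = collapse(hollow)
```

The filled triangle collapses to a single vertex, while the hollow triangle has no free face and is left unchanged, consistent with its nontrivial 1-dimensional homology.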

With V. Nanda, a former Ph.D. student, we have used the above-mentioned discrete
Morse theory methods to develop an efficient preprocessing algorithm that reduces the number
of cells in a filtered cell complex while preserving the homology of the filtration. This
implies in particular that the persistent homology of the reduced filtration matches the
persistent homology of the original filtration. We have successfully applied this technique
to experimental and numerical data in the context of dense granular media. This work has
been published [18] and open source code is provided [20].

In the context of time series data, or of outer approximations computed using non-convex
images, we can obtain non-acyclic valued cellular maps, so the above-mentioned
algorithms do not directly apply. The information at our disposal is the graph of the
multivalued map $\Gamma \subset X \times X$, the projection map $p \colon \Gamma \to X$ onto the domain, and the
projection map $q \colon \Gamma \to X$ to the range. The lack of acyclicity implies that we cannot
assume $p_* \colon H_*(\Gamma) \to H_*(X)$ is invertible; thus the minimal well defined information that
can be extracted is $G_* \colon \operatorname{image} p_* \to H_*(X) / q_*(\ker p_*)$. We are actively exploring the structure of this
map under hypotheses relevant to time series data, and what information is contained in this
map under these conditions. A paper detailing results has been accepted for publication
[7].

2.2 Computational Theory for the Decomposition of Nonlinear Dynamics

The classical theory of dynamical systems is based on the existence and structure of
invariant sets. However, since each orbit is an invariant set, most physically interesting
nonlinear dynamical systems contain uncountably many invariant sets. Chaotic dynamics and
bifurcation theory, developed over the last 50 years, show that there is no natural countable
set of invariant sets to which one can restrict one's attention in order to describe global
nonlinear dynamics. Thus, on a fundamental level, the classical formulation of nonlinear
dynamics is not computable.

We are developing an alternative framework for nonlinear dynamics that is explicitly
based on grids and outer approximations, which at any given level of resolution are countable.
We are in the process of comparing the computable structures in our novel approach
to dynamics with the classical notions of Morse decompositions and isolated invariant
sets. A paper demonstrating that there are no fundamental obstacles to computing Morse
decompositions [9] has appeared, and a paper demonstrating that convergence to classical
structures is obtained by refining the approximation methods, i.e. the grids and the outer
approximations, has been accepted [10]. Another paper explicitly relating the structure of
the lattice of attractors to Morse decompositions is in preparation [11].

2.3  Constructing  Multivalued  Cellular  Maps  from  Time  Series  Data 

Given a time series generated by a continuous map from a manifold to itself, we construct
associated Čech complexes and a simplicial map between the complexes. Extending the
ideas of P. Niyogi, S. Smale, and S. Weinberger, we obtain a lower bound on the probability
that a continuous selector of the simplicial map corresponds to the same homotopy type as
the original map. In particular, this gives us a lower bound on the probability of
correctly recovering the induced map on homology. This work has been published [3].

2.4 Analysis of Time Series Data of Persistence Diagrams

We are focusing on the analysis of time series data from spatio-temporally complex
systems arising in fluid dynamics, dense granular material, and social networks.
For each of these systems, each time point is a high dimensional data set.
We reduce the information at each time point by computing a persistence diagram that
captures the essential geometric information. Thus our time series consists of a series of
points in the space of persistence diagrams. By studying the dynamics in the space of
persistence diagrams we are revealing features of otherwise extremely high dimensional
systems. We are actively pursuing different methods for using metrics such as the bottleneck
and Wasserstein distances on these point clouds of persistence diagrams to characterize the
underlying dynamics. The efficiency of the code [20] is critical for these applications, since
we need to be able to compute millions of persistence diagrams.
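
As an illustration of measuring motion in the space of persistence diagrams, the sketch below computes the bottleneck distance between consecutive diagrams of a short series. It is a brute-force version that tries every matching, so it is only usable for diagrams with a handful of points; it is not the code [20], and the function name and the sample diagrams are made up for the example.

```python
from itertools import permutations

def bottleneck(diagram_a, diagram_b):
    """Brute-force bottleneck distance between two small persistence diagrams.

    Diagrams are lists of (birth, death) pairs. Each diagram is padded with
    the diagonal projections of the other's points so that perfect matchings
    exist; two diagonal points are matched at zero cost.
    """
    def diag(p):
        m = (p[0] + p[1]) / 2
        return (m, m)

    left = [(p, False) for p in diagram_a] + [(diag(q), True) for q in diagram_b]
    right = [(q, False) for q in diagram_b] + [(diag(p), True) for p in diagram_a]

    def cost(u, v):
        (p, p_on_diag), (q, q_on_diag) = u, v
        if p_on_diag and q_on_diag:
            return 0.0
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # L-infinity distance

    best = float("inf")
    for perm in permutations(range(len(right))):
        worst = max((cost(left[i], right[j]) for i, j in enumerate(perm)),
                    default=0.0)
        best = min(best, worst)
    return best

# A toy time series of diagrams and the distance moved at each time step.
series = [[(0, 2)], [(0, 3)], [(0, 3), (1, 2)]]
steps = [bottleneck(a, b) for a, b in zip(series, series[1:])]
```

At the scale mentioned above one would use an efficient matching algorithm instead of enumeration; the sketch only fixes the definition.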

We  have  applied  these  techniques  to  the  study  of  molecular  dynamics  simulations  of 
systems  of  dense  granular  media  under  compression.  A  paper  showing  that  these  techniques 
distinguish  the  force  network  structures  of  particles  of  dense  granular  media  according  to 
their  frictional  properties  and  the  polydispersity  of  the  particles  has  been  published  [12]. 
We have also published a paper that provides a mathematical proof of the stability of
this approach to the study of time series of particulate systems [14]. That is, we show that
small errors in the experimental or numerical protocols lead to small changes in the results
reported  by  our  approach.  We  use  these  techniques  to  study  the  spatial  properties  of  the 
temporal  evolution  of  these  systems  [13]. 

We show that similar ideas are applicable to the analysis of dynamics arising from
simulations of fluid convection [15].

We have applied these techniques to the study of the dynamics on random geometric
graphs, i.e. graphs which are dominated by "short" edges that arise from an underlying
geometric structure but also have random "long" edges. We show that for certain
ranges of noise and of the thresholds for propagation of information through the network,
these topological methods can be used to recover the structure of the underlying network [23].

Though not in the context of time series data, we have successfully applied ideas from
persistent homology to the analysis of compressibility of proteins [4].

3  Sayan  Mukherjee  Research 

3.1  Consistency  of  maximum  likelihood  estimation  for  some  dynamical 
systems 

In a paper that will appear in the Annals of Statistics the following results were shown:

We consider the asymptotic consistency of maximum likelihood parameter estimation
for dynamical systems observed with noise. Under suitable conditions on the dynamical
systems and the observations, we show that maximum likelihood parameter estimation is
consistent. Our proof involves ideas from both information theory and dynamical systems.
Furthermore, we show how some well-studied properties of dynamical systems imply the
general statistical properties related to maximum likelihood estimation. Finally, we exhibit
classical families of dynamical systems for which maximum likelihood estimation is
consistent. Examples include shifts of finite type with Gibbs measures and Axiom A attractors
with SRB measures.

This  is  the  first  paper  that  gives  rigorous  results  for  consistency  of  parameter  estimation 
for  deterministic  dynamical  systems  with  noise.  This  extends  recent  results  for  hidden 
Markov  models.  The  challenge  is  that  one  needs  to  use  ergodic  theory  instead  of  stochastic 
process  theory  to  prove  the  results. 

The  main  result  of  the  paper  is: 

Suppose that $(T_\theta, \mu_\theta)_{\theta \in \Theta}$ is a parametrized family of dynamical systems on
$(X, \mathcal{X})$ with corresponding observation densities $(g_\theta)_{\theta \in \Theta}$. If certain conditions hold
(these were given in an earlier report), then any approximate MLE is consistent at $\theta_0$.

3.2  Review  Article  on  Dynamical  Systems 

We have a paper under review in Statistics Surveys that is a review of inference in
dynamical systems:

The  topic  of  statistical  inference  for  dynamical  systems  has  been  studied  widely  across 
several  fields.  In  this  survey  we  focus  on  methods  related  to  parameter  estimation  for 
nonlinear  dynamical  systems.  Our  objective  is  to  place  results  across  distinct  disciplines 
in  a  common  setting  and  highlight  opportunities  for  further  research. 

3.3  Sufficient  statistics  for  surfaces  and  shapes 

We  have  developed  methodology,  theory,  and  code  to  model  surfaces  and  shapes.  A  paper 
is  currently  under  review  in  Information  and  Inference. 

We introduce a statistic, the persistent homology transform (PHT), to model surfaces
in $\mathbb{R}^3$ and shapes in $\mathbb{R}^2$. This statistic is a collection of persistence diagrams, which are
multiscale topological summaries used extensively in topological data analysis. We use the
PHT to represent shapes and execute operations such as computing distances between shapes
or classifying shapes. We prove that the map from the space of simplicial complexes in
$\mathbb{R}^3$ into the space spanned by this statistic is injective. This implies that the statistic is
a sufficient statistic for distributions on the space of "smooth" shapes. We also show that a
variant of this statistic, the Euler Characteristic Transform (ECT), admits a simple
exponential family formulation, which is of use in providing likelihood based inference for
shapes and surfaces. We illustrate the utility of this statistic on simulated and real data.
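
A small sketch may help fix what the ECT records. For one direction it is the Euler characteristic of the sublevel sets of the height function in that direction; the transform collects these curves over directions. The code below samples only two directions on a toy complex, and the function name, the data layout and the example shape are assumptions of this illustration, not the authors' implementation.

```python
def euler_curve(vertices, simplices, direction, thresholds):
    """Euler characteristic of sublevel sets of the height <x, direction>.

    vertices: dict mapping a vertex id to its coordinate tuple.
    simplices: vertex-id tuples, with every face listed explicitly.
    """
    height = {v: sum(a * b for a, b in zip(x, direction))
              for v, x in vertices.items()}
    curve = []
    for t in thresholds:
        chi = 0
        for s in simplices:
            # A simplex is present once all of its vertices are at height <= t.
            if max(height[v] for v in s) <= t:
                chi += (-1) ** (len(s) - 1)
        curve.append(chi)
    return curve

# Hollow triangle in R^2: three vertices and three edges.
vertices = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (1.0, 1.0)}
simplices = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2)]

directions = [(1.0, 0.0), (0.0, 1.0)]
thresholds = [-0.5, 0.5, 1.5, 2.5]
ect = {d: euler_curve(vertices, simplices, d, thresholds) for d in directions}
```

Each curve ends at the Euler characteristic of the whole shape (0 for a loop), while the intermediate values depend on the direction, which is what lets the family of curves carry shape information.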




We are currently using this method for a variety of applications:


1.  Quantitative  and  statistical  genetics  of  shape  phenotypes. 

2.  Regression  models  of  behavioral  networks. 

3.  Evolutionary  models  of  shape  spaces  and  morphology. 

4.  Graph  isomorphism  problems. 

3.4  Dimension  reduction 

The following formulation of dimension reduction for dynamical systems is a recent project.
Using the following framework we are able to integrate classic ideas from Bayesian time
series models such as Dynamic Linear Models (DLMs) with classic ideas in topological
dynamics such as Markov partitions. The practical utility of these ideas is that we are
developing dimension reduction methods for both continuous space dynamical systems and
discrete state dynamical systems.

Given  a  dynamical  system  defined  by  the  following  (stochastic)  maps 


$f \colon Z_t \to Z_{t+1}$


$g \colon Z_t \to Y_t.$

We consider the following dimension reduction procedure. We are given stochastic maps
$\psi \colon Z_t \to W_t$, where $W$ is much lower dimensional than $Z$, and $\varphi \colon W_t \to Z_t$, as well as the
dynamics in the lower dimension $q \colon W_t \to W_{t+1}$.

Given the above set of maps we can define the following losses to measure the goodness
of a dimension reduction method specified by $(\psi, \varphi, q)$:

(1) Prediction: $\mathbb{E}\big[\ell\big(Y_t \mid (Y^{t-1}, Z^t),\; Y_t \mid (Y^{t-1}, W^t)\big)\big]$

(2) Filtering: $D_{KL}\big(p(Z^t) \,\|\, p(W^t)\big)$.
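
The roles of the maps can be seen in a deterministic toy instance. Everything in the sketch below is an assumption made for illustration: the linear maps, the state space R^2 reduced to R, and a squared-error comparison of observations standing in for the prediction loss; the stochastic setting and the losses actually used in the project are not reproduced.

```python
def f(z):
    """Full dynamics on Z = R^2: a slow and a fast decaying coordinate."""
    return (0.9 * z[0], 0.1 * z[1])

def g(z):
    """Observation map from Z to Y."""
    return z[0] + z[1]

def psi(z):
    """Reduction from Z to W = R: keep the slow coordinate."""
    return z[0]

def phi(w):
    """Lift from W back to Z."""
    return (w, 0.0)

def q(w):
    """Reduced dynamics on W."""
    return 0.9 * w

def prediction_losses(z0, steps):
    """Squared error between observations of the full and reduced systems."""
    z, w = z0, psi(z0)
    losses = []
    for _ in range(steps):
        losses.append((g(z) - g(phi(w))) ** 2)
        z, w = f(z), q(w)
    return losses

losses = prediction_losses((1.0, 1.0), steps=5)
```

Here the reduced system tracks the slow coordinate exactly, so the loss is driven only by the discarded fast coordinate and decays as it dies out.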